Search resource list
DataSets
- Datasets used for text clustering, provided by overseas contributors for clustering algorithms: EMforSoftKmeans dataset, Binary_1, Binary_2.
textcluster
- A KMeans-based text clustering algorithm; supports text input and is simple and easy to understand.
wenbenleiju
- Research on and implementation of a text clustering algorithm based on text similarity computation, an important branch of Chinese information processing.
toolkit_for_words_En
- Handles stop words and words sharing the same stem in English text without changing the structure of the article. Suitable for preprocessing in text classification, text clustering, and recommendation.
cluster
- Python implementations of the k-means algorithm and the Fast Search And Find Of Density Peaks algorithm, for text clustering.
maxent-master
- Maximum entropy model algorithm, for research on statistical learning, text classification, and text clustering.
textcluster
- An implementation of text clustering; beginners can use it as a reference while learning about text clustering.
kmeans
- k-means is a classic text clustering algorithm and one of the ten classic data mining algorithms. This is a Java implementation of k-means.
textcluster
- A Java version of the k-means algorithm that implements text clustering.
datamining
- Slides in PDF format from the University of Southampton, UK. They introduce data mining techniques and applications, including decision trees, recommender systems, text clustering, search engines, and market basket analysis.
DataStructTest
- K-means text clustering method (IDEA project package); runs as downloaded.
Text-clustering
- A text clustering algorithm from machine learning; the package contains five files, including Python implementation code and test data.
Large-scale-text-clustering-master
- A Java implementation of text clustering.
Kmeans-master
- This program implements text clustering in Java, using the k-means method.
cluster
- Proposes a text clustering algorithm based on a semantic inner product space model.
words_1025_dic.txt
- DBSCAN; please do not download for now, it contains errors and will be cleaned up later (DBSCAN and word2vec for Chinese words).
Kmeans
- Algorithm idea: extract the TF-IDF weights of each document, measure the similarity of two documents by the cosine of the angle between their multidimensional weight vectors, then apply the standard k-means algorithm to cluster the texts. The source code is implemented in Java.
English
- Contains the original English documents together with the documents produced by text preprocessing (special-symbol removal, tokenization, stemming, similarity computation); 500 English documents in total.
EnglishChuLi
- A text preprocessing program written in Python, with implementation code for every step: punctuation removal, stop-word removal, similarity computation, PCA dimensionality reduction, clustering, and visualization. Developed in PyCharm with Python 3.
ChineseChuLi
- Python programs for Chinese text processing, including word segmentation, special-character removal, stop-word removal, a web crawler, PCA dimensionality reduction, k-means clustering, and visualization.
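
The Kmeans entry above describes its approach as TF-IDF weighting, cosine similarity between document vectors, and k-means. The following is a minimal Python sketch of that idea, not the Java source from that entry. Everything in it is an assumption made for illustration: the function names (`tfidf_vectors`, `cosine`, `kmeans`), the smoothed IDF formula, the fixed seed documents, and the choice to re-normalize centroids (the "spherical" k-means variant, which pairs naturally with cosine similarity). Documents are assumed to be already tokenized and non-empty.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into unit-length sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, seeds, iterations=10):
    """Cluster unit vectors by cosine similarity.

    `seeds` holds the indices of the documents used as initial centroids
    (a hypothetical, deterministic initialization chosen for this sketch).
    """
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for k in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    for t, w in v.items():
                        total[t] += w
            if not total:
                continue  # keep the previous centroid if the cluster is empty
            norm = math.sqrt(sum(w * w for w in total.values()))
            centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "food"],
    ["stock", "market", "trade"],
    ["market", "trade", "price"],
]
labels = kmeans(tfidf_vectors(docs), seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Because all vectors are normalized to unit length, ranking centroids by cosine similarity gives the same assignments as ranking them by Euclidean distance, which is why this pairing is common in text clustering. A real pipeline would add tokenization, stop-word removal, and a better initialization than fixed seeds.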